Anticodon table of the chloroplast genome and identification of putative quadruplet anticodons in chloroplast tRNAs

The chloroplast genome of 5959 species was analyzed to construct the anticodon table of the chloroplast genome. Analysis of the chloroplast transfer ribonucleic acid (tRNA) revealed the presence of a putative quadruplet anticodon containing tRNAs in the chloroplast genome. The tRNAs with putative quadruplet anticodons were UAUG, UGGG, AUAA, GCUA, and GUUA, where the GUUA anticodon putatively encoded tRNAAsn. The study also revealed the complete absence of tRNA genes containing ACU, CUG, GCG, CUC, CCC, and CGG anticodons in the chloroplast genome from the species studied so far. The chloroplast genome was also found to encode tRNAs encoding N-formylmethionine (fMet), Ile2, selenocysteine, and pyrrolysine. The chloroplast genomes of mycoparasitic and heterotrophic plants have had heavy losses of tRNA genes. Furthermore, the chloroplast genome was also found to encode putative spacer tRNA, tRNA fragments (tRFs), tRNA-derived, stress-induced RNA (tiRNAs), and the group I introns. An evolutionary analysis revealed that chloroplast tRNAs had evolved via multiple common ancestors and the GC% had more influence toward encoding the tRNA number in the chloroplast genome than the genome size.

The origin of the genetic code and the translation event is major transition points in evolutionary biology. The triplet genetic code is hailed as one of the most important and ultimate evolutionary anchors and indisputable evidence of life. The triplet genetic code understands the specific assignment of the amino acids in the translation machinery Cells use the universal manual and a guided dictionary to translate the corresponding amino acids into the translating protein. The number of codon combinations on the mRNA can be an astounding number of feasible protein sequences, from which only a few can be found in nature. The triplet genetic code is believed to be universal and degenerate and accommodates twenty essential amino acids using sixty-one sense and three-stop codons. However, an emerging study has proved that the "Universal Genetic Code" is no longer universal and can be called as canonical 1,2 . Sometimes nature enhances the protein functionalities through codon reassignment to incorporate new amino acids. This has led to the discovery of the role of selenocysteine (Sec) and pyrrolysine (Pyl) amino acids in the protein through the assignment of the stop codon as the sense codon. However, sense codon reassignment requires low-frequency codons, so stop codons are used for this purpose. It has been demonstrated that except for the triple codons, the Escherichia coli ribosome can accommodate codons and anticodons of variable sizes 3 .
Taking this opportunity, they have translated four base codon pairs, CCCU, AGGA, UAGA, and CUAG, using the four base anticodons 3 . Frame-shifting of the + 1 nucleotide is most favorable in the absence of suppressor tRNA in E. coli 3 . The study also reveals that frame maintenance during translation is not absolute, and mutant tRNA can promote a frameshift, which can occur with high frequencies at the programmed site of the mRNA 4,5 . Riddle and Carbon (1973) reported the presence of four base anticodons CCCC in tRNA Gly instead of the wildtype CCC 6 . A study by Mohanta et al. 14 revealed the presence of nine nucleotide anticodons instead of seven nucleotides 7 . These features in the tRNAs certainly explain the existence of extended codons and anticodons. Most likely, these evolutionary scenarios exist in codons and tRNAs to meet the novel translational demand.
The availability of enormous genome sequencing data is quite valuable for digging deep into the molecular features of the protein translation machinery. Significant studies have been performed in the field of codons Chloroplast genome encodes putative duplet and quadruplet anticodons. We have already mentioned that the triplet genetic code is not universal; it is canonical. Therefore, the genome might have sup- www.nature.com/scientificreports/ pressed or extended the genetic code, which is yet to be elucidated, to a greater extent. Our study found that the chloroplast genome encodes the putative duplet and quadruplet anticodons (Supplementary Files 4,5). The annotation of tRNA with quadruplet anticodon was found when chloroplast genomes were annotated in the GeSeq chlorobox (https:// chlor obox. mpimp-golm. mpg. de/ geseq. html). However, re-analysis of the tRNA with the quadruplet anticodon in tRNAscan-SE did not result in a tRNA with a quadruplet anticodon, which might be due to the default setting for identification of a tRNA with a triplet anticodon. We are the first to report the presence of duplet and quadruplet anticodons in the chloroplast genome of the plant kingdom. We found that at least 91 species were encoded quadruplet anticodons (Supplementary File 4  Chloroplast genome encodes putative spacer tRNAs. Spacer RNA genes are usually found in bacterial genomes in the spacer region between the 16S and 23S rRNAs. When we focused our study on the spacer RNA in the chloroplast genome, we found that chloroplast genomes were also encoded in the putative spacer tRNAs between the 16 and 23S rRNA genes. tRNA Ala (UGC) and tRNA Ile (GAU) were the most predominant spacer tRNAs found in the chloroplast genome (Fig. 2). The percentages of the UCG and GAU anticodons in the chloroplast genome were 5.13 and 4.98, respectively. This showed that spacer tRNAs were more common in the chloroplast genome. Sometimes, it contained tRNA fMet (CAU) and tRNA Ser (GCU) in the spacer region. All the chloroplast genomes did not encode the spacer tRNAs (Supplementary File 7). None of the mycoparasitic plants was found to encode the putative spacer tRNA in their chloroplast genome. However, the majority of the species encoded putative spacer tRNAs.
The majority of chloroplast tRNAs encode group I intron. It was found that the majority of chloroplast-encoding tRNAs encode introns. Except for tRNA Arg , tRNA Asn , tRNA Asp , tRNA Gln , tRNA His , tRNA Pro , tRNA Trp , and tRNA Val all other tRNA genes were found to contain group I introns ( Table 2). The introns found in tRNA seem to be isotype-specific ( Table 2). The introns are conserved within the tRNA isotype, and the conserved nucleotide sequences of the introns of one isotype do not match with the conserved introns of other isotypes ( Table 2). When we cluster the conserved region of the introns, they form four groups (Supplementary Figure 1). We have named them groups A, B, C, and D. Group A contains tRNA Leu , tRNA Tyr , and tRNA Cys ; group B contains tRNA Ser ; group C contains tRNA Lys , tRNA Met , and tRNA Ala , and group D contains tRNA Gly , tRNA Ile , tRNA Glu , and tRNA Thr (Supplementary Figure 1). However, the introns of tRNA Phe do not group with any other introns (Supplementary Figure 1).
Chloroplast genome encodes putative novel tRNAs. Although we all are well-acquainted with the fact that tRNA makes a clover leaf-like structure, we found some variations in the tRNA structure. Analysis revealed the presence of a few novel tRNA structure/tRNA-like molecules ( Fig. 3 and Fig. 4). Some putative novel tRNA-like structures seemed to lack the anticodon loop, whereas, in some cases, they had extra sequences near the anticodon arm region (Fig. 3). A tRNA-like structure contained an extended nucleotide sequence in the www.nature.com/scientificreports/ region between the D-arm and anticodon arm (Fig. 4). At least 42 species were found to encode novel tRNAlike structures that contained extended nucleotide sequences between the D-arm and anticodon arm (Fig. 4). Furthermore, a few tRNAs were found to have lost the pseudouridine loop (Fig. 5), suggesting the presence of novel tRNAs/tRNA-like structures in the chloroplast genome.
Chloroplast genome encodes putative tRNA Fragments (tRFs). Analysis revealed the presence of at least 55 tRFs in the chloroplast genome. The tRFs are small 14-32 nucleotides novel class of small, non-coding RNAs, derived from the mature or precursor tRNAs that are different from the tRNA-derived, stress-induced tRNAs (tiRNAs) 8,9 . The tRFs found were for tRNA Glu , tRNA Arg , tRNA Gly , tRNA His , tRNA Val , tRNA Ile , tRNA Thr , tRNA Leu , tRNA Lys , and tRNA Ala (Supplementary File 8). The tRFs of tRNA Glu were found to contain conserved nucleotide sequence GGC CTT ATC GTC TAG TGA T, whereas those of tRNA Gly were found to contain conserved GCG GGT ATA GTT TAG TGG TAAA nucleotides (Supplementary File 8). As such, we did not find conserved nucleotide sequences for the other tRFs. The tRFs of tRNA Ala , tRNA Gly , tRNA Ile , tRNA Lys , and tRNA Leu were 5ˈ-tRFs, whereas the tRFs of tRNA His , tRNA Thr , and tRNA Val were 3ˈ-tRFs. The tRFs of tRNA Glu did not match either the 5ˈ-or 3ˈ-end of the tRNA, which might have originated from the precursor tRNA transcript. Therefore, they can be classified as tRF-1.  Table 2. Conservation of introns in chloroplast tRNAs. From the mentioned tRNA isotypes, at least eight isotypes do not encode any intron in their tRNA genes or its not conserved. www.nature.com/scientificreports/ Chloroplast genome encodes putative tiRNAs. The longer tRFs (tRNA fragments) of 30-50 nucleotide-long sequences are called tRNA-derived, stress-induced RNAs (tiRNAs) 8 . Therefore, we searched for the presence of 30-50 nucleotide tRFs. We found at least 244 tRNA sequences, which encoded the 30- Machine machine-learning approach showed GC% influences the tRNA number in the chloroplast genome. We grouped the chloroplast genomes of all the species according to their clade and conducted a comparative study. The analysis revealed that the average tRNA gene number in monocot (37.80%)   (Fig. 6). The chloroplast genomes of the species Isolepis setacea and Vitis romanetii were found to encode the highest number of tRNAs, that is, 52 each (Supplementary File 6). On average, the chloroplast genomes were found to encode 36 tRNA genes per genome. A machine-learning approach was used to understand the role of the GC content and genome size in the tRNA number in the chloroplast genome. The boosting analysis revealed that the relative influence of the GC% was more than the genome size (Fig. 7). A principal component analysis was conducted to see their association with different clades.

tRNA Isotype Conserved Consensus sequence of Intron
Chloroplast tRNAs evolve from multiple common ancestors. We conducted a phylogenetic analysis by considering the tRNA genes of the chloroplast genome. The phylogenetic analysis revealed clear and distinct phylogenetic clusters of tRNAs. The phylogenetic tree showed two major distinct clusters suggesting their origin from multiple common ancestors (  (Fig. 9). Genes undergo mutation, which is a common phenomenon. Although it was a common phenomenon in coding genes, non-coding genes also showed frequent mutation. Therefore, a transition/transversion bias study was conducted for the chloroplast tRNAs. The analysis revealed that transition predominates transversion (Supplementary Table 2). The transition/transversion bias was found to be the highest for tRNA Asn (R = 13.71), whereas tRNA Ser (1.22) had the lowest bias (Supplementary Table 2). The transition/transversion bias of tRNA Asn was followed by tRNA Tyr (11.51) and tRNA Trp (8.63). Although tRNA Arg , tRNA Leu , and tRNA Ser encoded six Isoacceptors, their transition/transversion bias was comparatively lower than others (Supplementary Table 2).

Discussion
The chloroplast genome harbors several coding and non-coding sequences, including rRNA and tRNA 10 . These genetic elements and their potential to translate codons make them semi-autonomous organelles of the plant cell. A detailed genomic analysis of the chloroplast tRNA reveals that it does not encode all the 64 anticodons required for the tRNAs. The tRNAs with anticodons ACU (tRNA Ser ), CUG (tRNA Gln ), GCG (tRNA Arg ), CUC (tRNA Glu ), CCC (tRNA Gly ), and CGG (tRNA Pro ), are absent in the chloroplast genome of the studied species. Therefore, these anticodons can be classified as rare anticodons of the chloroplast genome. The ACU anticodon of tRNA Ser and the GCG anticodon of tRNA Arg are from the hexa-isoacceptor group. In contrast, the CCC anticodon www.nature.com/scientificreports/ of tRNA Gly and the CGG anticodon of tRNA Pro are from the tetra-isoacceptor group. Therefore, a lack of these anticodons from their isoacceptor group does not make any difference in the genome, as other isoacceptors are available for their use to encode the codon. However, tRNA Gln is encoded only by CUG and UUG anticodons, whereas tRNA Glu is encoded by the CUC and UUC anticodons. The lack of the CUG anticodon from tRNA Gln and the CUC anticodon from tRNA Glu in the chloroplast genome has left these tRNA isotypes with only one choice of anticodon (Table 1). The lack of the CUG anticodon in tRNA Gln and the CUC anticodon in tRNA Glu in the chloroplast genome may be due to a strong selection pressure to establish UUG (tRNA Gln ) and UUC (tRNA Glu ) anticodons as the dominant anticodons. The tRNA anticodons followed by nucleotides CUx (x = any nucleotide) may have undergone a strong evolutionary pressure. Hence anticodons CUA, CUU, CUG, and CUC, encode only 197, 7, 0, and 0 anticodons, respectively, in the chloroplast genome (Table 1). However, the CAU anticodon encoding tRNA Met has been seen to have the highest percentage (5.47%) in the chloroplast genome (Supplementary File 1). The CAU anticodon of tRNA Met , of the nuclear-encoded genome has also been found in the highest (5.03%) abundance 11 , thus corroborating CAU, as the most abundant anticodon in the nuclear and chloroplast genomes. The anticodons CAU (tRNA Met ), GUU (tRNA Asn ), UGC (tRNA Ala ), and ACG (tRNA Arg ) have been found to encode more than 5% each of the total anticodons, suggesting the role of positive selection pressure in these anticodons (Supplementary File 1). However, at the isotype/isodecoder level, tRNA Leu (10.27%) has been found to contain the highest percentage of anticodons, followed by tRNA Ile (9.93%) and tRNA Arg (7.96%) (Supplementary File 1). A similar level of abundance has been found for tRNA Leu (7.80%) for the nuclear-encoded tRNA genes, reflecting a similarity in the anticodon abundance in the nuclear and chloroplast genomes 11 . However, an abundance of the nuclear-encoded anticodons tRNA Leu is followed by tRNA Ser (7.66%), tRNA Gly (7.52%), and tRNA Arg (7.28%) 11 . Although tRNA Leu is the highest encoding isotype/isodecoder in nuclear-(7.80%) and chloroplast (10.27%)-encoded genomes, there is a great difference in their percentage. The chloroplast-encoded CAU anticodon also encodes tRNA Ile2 (4.93%). The CAU anticodon for tRNA fMet (0.33%) is also relatively abundant in the chloroplast genome. The tRNA fMet acts as an initiation anticodon in protein synthesis in mitochondria, bacteria, and chloroplasts, and the presence of tRNA fMet in the chloroplast genome is quite justified. However, only 709 tRNA fMet genes were found during the analysis suggesting that tRNA fMet is not a universal tRNA of the chloroplast genome. A majority percentage of the chloroplast genome does not encode tRNA fMet . A few chloroplast genomes encode the tRNAs for selenocysteine and pyrrolysine amino acids (Table 1). However, Zhao et al. 12 has reported the absence of tRNA Sec in gymnosperm plants. The Sec amino acid specified by the UGA codon requires the presence of the selenocysteine insertion sequence (SECIS) element, and the Pyl amino acid  www.nature.com/scientificreports/ encoded by the UAG codon requires the pyrrolysine insertion sequence (PYLIS) 13 . The presence of tRNA for encoding Sec and Pyl reflects that the chloroplast genome may have SECIS and PYLIS. It was also very peculiar to see the loss of tRNA genes in the chloroplast genome of heterotrophic and mycoparasitic plants. Our previous study reported the loss of several other genes in the chloroplast genome in mycoparasitic and heterotrophic plants 14 . A similar is true for the tRNA genes as well. In the absence of tRNA genes in the chloroplast genome, the cell most probably uses the tRNA genes from the nuclear-encoded genome. However, the loss of tRNA genes in the chloroplast genome seems independent of the nuclear genome. The parasitic and heterotrophic plants require less effort to complete their lifecycle, as they are completely dependent on their host. Hence, they do not need a lot of genes for their function and hence, may be under constant pressure to eliminate genes. Therefore, these mycoparasitic and heterotrophic plants contain only 14 (CAU, CCA, GAA, GCA, GCU, GUA, GUC, GUG, GUU UCC, UCU, UUC, UUG, and UUU) anticodons in their chloroplast genome.
It is well known that the triplet genetic code is canonical and not universal 15 . The genetic code can be expanded, where specific codons can be reallocated to encode non-proteogenic amino acids. The tRNA genes undergo rapid changes to meet the translational demand of the cell 16 . Therefore, it is highly possible that tRNA can expand its anticodon nucleotide number. Our study helped us to discover the presence of quadruplet anticodons in the chloroplast genome of 91 plant species (Supplementary File 4). The quadruplet anticodons found in our study were UAUG, UGGG, AUAA, GCUA, and GUUA. Studies regarding functional quadruplet anticodons are reported in a few cases [17][18][19][20][21][22][23][24][25] . DeBenedictis et al., (2022) reported the translation of four-base codons in natural and synthetic systems 25 . They reported 20 isoacceptors can be changed to functional quadruplet tRNAs 25 . Anderson et al. 17 reported the role of the quadruplet codon AGGA through changes in the tRNA anticodon loop to CUU CCU AAA in a suppressor tRNA CUA . The suppression of the amber tRNA led to the encoding of homoglutamine (hGln), using the AGGA codon 17 . They also reported that quadruplet codons CCCU or CUAG could suppress the amber tRNA and allow the incorporation of unnatural amino acids into the protein in Escherichia coli 17 . Neumann et al. 18 , reported the encoding of unnatural amino acids through the evolution of the quadruplet anticodon in response to the amber codon tRNA CUA . Chloramphenicol resistance was achieved when tRNA UCUU Ser2 translated the AAGA codon, and tRNA UCCU Ser2 translated the AGGA codon 18 . Niu et al. 19 replaced tRNA Pyl CUA with the UCCU anticodon and generated tRNA Pyl UCCU , which recognized and suppressed the quadruplet codon AGGA. This provided a qualitative notion for suppressing the quadruplet codon through tRNA UCCU 19 . Most specifically, the presence of the quadruplet anticodon was associated with the suppression of the amber tRNA and the incorporation of the unnatural amino acid into the protein chain. The tRNA GCUA contained an additional G nucleotide prior to the tRNA CUA anticodon, suggesting its role in the suppression of  Figure 7. Machine-learning analysis of GC % content and genome size in the tRNA gene number in the chloroplast genome; the random forest approach was used to run the analysis. In the study, from 5959 species, 3814 species were used as training sets, 954 for validation, and 1191 as test sets. All the analysis was conducted at p < 0.05. Analysis revealed that the GC% content had more influence toward the number of tRNA gene numbers than the genome size. www.nature.com/scientificreports/ the amber codon. In the tRNA Asn GUUA anticodon, most probably, nucleotide A was incorporated after the GUU anticodon, as the tRNA with the GUU anticodon was grouped with the GUUA anticodon in the phylogenetic tree (Fig. 9). Similarly, in the UGGG anticodon, the G nucleotide got incorporated in the UGG anticodon, as they grouped with the UGG anticodon (Fig. 9). The GCUA anticodon was grouped with GCU anticodon, suggesting that the A nucleotide was incorporated at the fourth position of the GCU anticodon, which gave rise to the GCUA anticodon (Fig. 9). However, no such clue was found in the case of the UAUG and AUAA anticodons. Considering the incorporation of the additional nucleotide at the fourth position, we could speculate that the G nucleotide was most probably incorporated in the UAU anticodon and gave rise to the UAUG anticodon. Similarly, the A nucleotide was incorporated at the fourth position of the AUA anticodon to give rise to the AUAA anticodon. Although we found only five putative quadruplet anticodons, the genome could accommodate at least 256 quadruplet anticodons/codons in the cell (Table 2). We also found the presence of tRNAs, with only duplet anticodon, where one nucleotide was possibly deleted from the anticodon (Supplementary File 5). At least 13 species contained duplet anticodons in the tRNA of the chloroplast genome (Supplementary File 5).
The chloroplast encoding tRNAs were also found to encode the group I introns. These group I introns was conserved in their respective isotype/isodecoder groups (Table 2). From a total of 20 isotypes, 12 of them were found to encode the group I introns (Table 2). However, the group I intron of one isotype was not conserved with the intron of another isotype, reflecting the isotype-based conservation of the group I intron, in the tRNA.
It is well-reported that group I introns are found in tRNAs, bacteria, lower eukaryotes, and higher plants [26][27][28] . Some of the group I intron encode homing endonucleases catalyze intron mobility, thus facilitating the movement of the intron from one location to another and from one organism to another 27 . However, the incorporation of the group I intron in the tRNA gene is isotype-specific, as only 12 isotypes have been found to encode the intron, while eight isotypes do not have any intron in their tRNAs (Table 2). From the eight isotypes, tRNA His , tRNA Gln , tRNA Asp , tRNA Asn , and tRNA Arg belong to the polar group, whereas, tRNA Trp , tRNA Pro , and tRNA Val belong to the non-polar group. This shows that the presence of the type I intron tends to be more toward the tRNA that encodes polar amino acids. Furthermore, it is seen that the chloroplast genome also encodes the putative spacer tRNAs (Fig. 2). It is reported that E. coli contains a spacer tRNA (tRNA Ala and tRNA Ile ) that is present in the spacer region of the 16S and 23S rRNA 29 . The tRNAs, tRNA Ala , and tRNA Ile , have also been found in the spacer region of 16S and 23S rRNA, suggesting the presence of a spacer tRNA in the chloroplast genome. Although, in a majority of cases, tRNA Ala and tRNA Ile are the predominant spacer tRNAs; tRNA Glu can be the third most possible spacer tRNA of the chloroplast genome.
The analysis also revealed the presence of tRNA fragments (tRFs) in the chloroplast genome. We found at least 55 tRFs that belonged to ten tRNA isotypes (Supplementary File 8). These tRFs were putatively derived Figure 8. Phylogenetic tree of chloroplast tRNAs. The phylogenetic tree shows two distinct major clusters named clusters I and II. The phylogenetic tree shows that chloroplast tRNAs have evolved from multiple common ancestors. In cluster I, anticodons GCC, CGU, CGA, UCU, CAA, and UAG, are found in more than one group, and the anticodons GCC, GCU, UUC, CAU, and GAA are found in both clusters, showing their evolution via duplication. The phylogenetic tree has been constructed using the neighbor-joining method using the Clustal W program. www.nature.com/scientificreports/ from the tRNA precursors or from the cleavage of mature tRNAs 30 . The tRFs were reported to control gene expression, translation control, transposon control, ncRNA, and DNA damage response 8,30-32 . Although we found ten different chloroplast-derived tRFs, the majority of them belonged to tRNA Glu and tRNA Gly (Supplementary File 8). Among them are the, tRNA Glu are tRF-1 type, tRNA Gly are tRF-5ˈ-type, and tRNA His , tRNA Thr , and tRNA Val are tRF-3ˈ type (Supplementary File 8). Furthermore, we also noted the presence of a few putative tRNA-derived, stress-induced RNA (tiRNAs) fragments (tiRFs) in the chloroplast genome. The majority of the tiRFs were from tRNA Lys (UUU). For the first time, tiRFs were reported in the human fetus hepatic tissue and osteosarcoma cells 33,34 . These tiRFs could be generated in the cell under different stress conditions via cleavage of mature tRNAs 33 . However, their presence as independent nucleotide fragments in the annotated genome Figure 9. Phylogenetic tree of a putative quadruplet anticodon containing tRNAs with triplet codoncontaining tRNAs. The phylogenetic grouping revealed that the quadruplet anticodons had evolved via the addition of a nucleotide preceding the third nucleotide of the triplet anticodons. The evolutionary history was inferred by using the Maximum Likelihood method based on the Tamura-Nei model. The tree with the highest log likelihood (−1053.93) is shown. Initial tree(s) for the heuristic search were obtained automatically by applying the Neighbor-Join and BioNJ algorithms to a matrix of pair-wise distances, estimated by using the Maximum Composite Likelihood (MCL) approach and then selecting the topology with the superior log likelihood value. The tree was drawn to scale, with branch lengths measured in the number of substitutions per site. The analysis involved 147 nucleotide sequences. All positions with less than 95% site coverage were eliminated; that is, fewer than 5% alignment gaps, missing data, and ambiguous bases were allowed at any position. There were a total of 26 positions in the final dataset. Evolutionary analyses were conducted using the MEGA 7. www.nature.com/scientificreports/ sequence reflected their independent presence in the genome. Although the cleavage of tRNAs to tiRFs was brought about by the enzyme angiogenin (an RNase superfamily) 34 in the human cell, its counterpart in plants needs to be identified to understand its detailed functions. The 5ˈ-tiRNA Ala and tiRNA Cys were reported to inhibit translation in rabbit reticulocytes 34 suggesting their inhibitory role in protein translation. This study also found a putative novel tRNA structure encoded by the chloroplast genome (Fig. 4). The tRNA Gly (UCC) was found to contain a long nucleotide sequence between the D-arm and anticodon arm in several species. This long arm could most probably be an intron that might have been incorporated in between these two arms. The chloroplast tRNAs, which had lost the pseudouridine loop (Ψ), seemed to be metazoan mitochondrial-specific (Fig. 5). The loss of the Ψ-loop in tRNA was first reported in the 1970s [35][36][37] . Previous studies also reported the loss of the Ψ-arm and loop in nematode mitochondrial tRNA 37 . However, in the nematode mitochondrial tRNA, the Ψ-arm and loop were present in the tRNA Ser (GCU), whereas it had lost the Ψ-arm and loop in tRNA Ser (GCU and GGA) in the chloroplast genome (Fig. 5). The elongation factor (EF) Tu combined with GTP to form a complex that delivered the amino-acyl tRNA to the ribosome A site through binding of the acceptor,s arm and Ψ-arm 38 . In the absence of the Ψ-arm and loop in the tRNAs, it might use some alternative binding mode for EF-Tu 39,40 . Caenorhabditis elegans mitochondrial EF-Tu, it has around 60 amino acid extensions at the C-terminal end that might play an essential role in binding tRNAs that lack the Ψ-arm 41,42 . This also suggested that the mitochondrial ribosomal protein might have alternate binding sites for the truncated tRNA. Furthermore, the metazoan, mitochondria-specific, truncated tRNA in the chloroplast genome suggested that these tRNA genes might be shared by sub-cellular organelle chloroplast and mitochondria.
Evolutionary analysis revealed that chloroplast tRNAs are derived from multiple common ancestors (Fig. 8). The phylogenetic tree of the chloroplast tRNA shows two distinct clusters, which reflect their evolution from multiple common ancestors. In cluster I, anticodons GCC, CGU, CGA, UCU, CAA, and UAG make more than one group, whereas none of the anticodons from cluster II are found to make more than one group (Fig. 8). The anticodons GCC, GCU, UUC, CAU, and GAA are also found in both clusters (Fig. 8). This suggests that tRNAs with anticodons GCC, CGU, CGA, UCU, CAA, and UAG, of cluster I may have undergone vivid duplication and produced more than one anticodon group.

Conclusions
Chloroplast is a semiautonomous organelle of the plant and protist kingdom with a great potential to encode its own genome and protein translation machinery. The important tRNA molecules required for the protein translation process are well documented. The chloroplast genome encodes putative duplet, triplet, and quadruplet anticodons suggesting their role in recognizing duplet, triplet, and quadruplet codons in the mRNA. Mycoparasitic plants have lost their chloroplast genome to a large extent, thereby losing several chloroplast-encoded tRNA genes. Further, several of the chloroplast-encoded tRNA genes were found to encode introns, and the presence of intron in the chloroplasts genome suggests the presence of introns in the gene of their prokaryotic ancestor cyanobacteria. Further, the chloroplast genome is very selective and encoded only a few isoacceptors abundantly, while GCG, CUG, CUC, CCC, CGG, and ACU anticodons were found to be the rarest form of anticodons in the chloroplast genome. It is important to understand why chloroplast genomes do not encode tRNA with such anticodons. The tRNAs with quadruplet anticodons will enable us to provide a platform for the synthetic biologist to engineer tRNAs with quadruplet anticodons to understand the expansion of quadruplet genetic code.

Materials and methods
All the chloroplast genomes were downloaded from the National Center for Biotechnology Information (NCBI) database. In total, 5959 chloroplast genomes were used in this study. The downloaded chloroplast genomes were subjected to tRNA annotation. tRNA annotation was conducted using tRNAscan-SE 2.0, Aragorn, and the GeSeq-Annotation of the organellar genomes [43][44][45] . The Linux-based approach was used to annotate the chloroplast tRNA for tRNAscan-SE 2.0 and Aragorn. In the GenSeq-annotation of the organellar genome, the chloroplast genome files were uploaded with the following parameters; sequence source: plastid; annotation option: Annotate plastid inverted repeats; blat search: Default; annotate: CDS, tRNA, and rRNA; and third-party tRNA annotator: Aragorn v1.2.38, tRNAscan-SE v2.0.7. All the tRNA sequences generated from these three annotation pipelines were corroborated and used for further analysis. All the data obtained from tRNAscan-SE and Aragorn were further processed in an excel worksheet. The Organellar Genome Draw (OGDRAW) was used to draw the organellar genome map of the chloroplast genome 46 . The Genbank file was used to draw the chloroplast genome map in OGDRAW 46 .
Multiple sequence alignment. The intron sequences retrieved from the chloroplast tRNA were aligned to find the possible conserved structure. Multiple sequence alignment was conducted using the Multalin software (http:// multa lin. toulo use. inra. fr/ multa lin/) that uses hierarchical clustering 47 . Default parameters were used to construct the alignment.
Machine learning approach and statistical analysis. A machine learning approach was used to understand the role of the genome size and GC% content in the number of tRNA genes in the chloroplast genome. The random forest regression approach was used for this purpose. The following parameters were used in the random forest analysis: target tRNA gene number, predictor's genome size, and GC% content; Plots: data split, out-of-bag error, predictive performance, the mean decrease in accuracy, and the total increase in node purity; tables: evaluation matrix; data split preference: sample 20% of all data; training and validation of data: 20% validation data. The training parameters were as follows; training data used per tree: 50%; predictor per split: auto; and the max tree: 100%. The machine-learning approach was studied using the JASP software version www.nature.com/scientificreports/ 0.16.1.0 48 . The correlation plot for GC% content and tRNA was also conducted using the JASP 0.16.1.0 software. The following parameters were used for the correlation analysis, sample correlation coefficient: Pearson's r and confidence interval: 95% (p < 0.05) 48 .
Phylogenetic tree. The tRNA sequences of the chloroplast genomes were taken to construct the phylogenetic tree. The phylogenetic tree was constructed using the Clustalw program (version 2.1) in a Linux-based environment. A neighbor-joining tree was constructed with 100 bootstrap replicates. The resulting file was saved in nwk file format and later uploaded in the iTOL Interactive Tree of life (version 6) to view tree 49 . The phylogenetic tree of the tRNA quadruplet anticodons, with other anticodons, was constructed using the MEGA software version 7 50 . Prior to the construction of the phylogenetic tree, the tRNA sequences were subjected to multiple sequence alignments. Multiple sequence alignments were conducted using the MUSCLE software version 1 51 . The resulting clustal file was converted to the MEGA file format (aln) using the MEGA 7 software 50 . The converted file was subjected to construct the phylogenetic tree in the MEGA 7 software, using the maximumlikelihood approach. The phylogenetic tree of the tRNA introns was also constructed using the MEGA 7 software with the same statistical parameters 50 . The Tamura-Nei model, with 500-bootstrap replicates was used for the analysis.

Data availability
All the data used during this study was taken from National Center for Biotechnology Information database, and all the data are available in the public domain. Also, the accession numbers are provided in the supplementary files.